AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data
نویسندگان
چکیده
The unprecedented spread of location-aware devices has resulted in a plethora of location-based services in which huge amounts of spatial data need to be efficiently processed by large-scale computing clusters. Existing cluster-based systems for processing spatial data employ static data-partitioning structures that cannot adapt to data changes, and that are insensitive to the query workload. Hence, these systems are incapable of consistently providing good performance. To close this gap, we present AQWA, an adaptive and query-workload-aware mechanism for partitioning large-scale spatial data. AQWA does not assume prior knowledge of the data distribution or the query workload. Instead, as data is consumed and queries are processed, the data partitions are incrementally updated. With extensive experiments using real spatial data from Twitter, and various workloads of range and k-nearest-neighbor queries, we demonstrate that AQWA can achieve an order of magnitude enhancement in query performance compared to the state-of-the-art systems.
منابع مشابه
A Demonstration of AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data
The ubiquity of location-aware devices, e.g., smartphones and GPS devices, has led to a plethora of location-based services in which huge amounts of geotagged information need to be efficiently processed by large-scale computing clusters. This demo presents AQWA, an adaptive and query-workload-aware data partitioning mechanism for processing large-scale spatial data. Unlike existing cluster-bas...
متن کاملAmoeba: A Shape changing Storage System for Big Data
Data partitioning significantly improves the query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset for a given query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload upfront. Furthermore, workloads change over time as ...
متن کاملHAQWA: a Hash-based and Query Workload Aware Distributed RDF Store
Like most data models encountered in the Big Data ecosystem, RDF stores are managing large data sets by partitioning triples across a cluster of machines. Nevertheless, the graphical nature of RDF data as well as its associated SPARQL query execution model makes the efficient data distribution more involved than in other data models, e.g., relational. In this paper, we propose a novel system th...
متن کاملAn Adaptive Partitioning Scheme for Ad-hoc and Time-varying Database Analytics
Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. F...
متن کاملSpatial coding-based approach for partitioning big spatial data in Hadoop
Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, due to skew distribution of spatial data and varying volume of spatial vector objects, it leads to a significant challenge to ensure both optimal performance of spatial operation and data balance in the cluster. To tackle this problem, we proposed a spatial coding-based...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 8 شماره
صفحات -
تاریخ انتشار 2015